This webinar introduces machine learning at scale. It gives an overview of approaches for parallelizing R code on HPC systems, covers the essentials of Spark, and demonstrates how to use Spark for large-scale data analytics and machine learning. The demonstrations give participants practical guidance for building and scaling machine learning workflows.
Scalable Machine Learning
Advanced HPC-CI Webinar Series: Scalable Machine Learning
Remote event
Instructors
Dr. Mai Nguyen
Lead for Data Analytics
Mai Nguyen has extensive industry and academic experience in machine learning, data mining, business intelligence, data warehousing, and software design and development. She is a data scientist at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD), where she works on combining machine learning algorithms with distributed computing to process large-scale data. She has worked in many application areas, including remote sensing, personalized medicine, image analysis, and speech recognition. She holds M.S. and Ph.D. degrees in Computer Science from UCSD, with a focus on machine learning and artificial intelligence.
Dr. Paul Rodriguez
Computational Data Scientist
Paul Rodriguez received his Ph.D. in Cognitive Science from the University of California, San Diego (UCSD) in 1999. He spent several years doing research in neural network modeling, dynamical systems simulations, time series analysis, and statistical methods for analysis and prediction in fMRI data. More recently, he has worked on data mining for health care fraud identification and on optimization of data-intensive network flow models.